
4 Advanced NLP and Text Mining

Text Mining makes use of techniques from “pure”, domain-independent machine

learning and natural language processing. However, many current systems in the

Life Sciences use only very little linguistic information, i.e., typically only word stems

or part-of-speech tags. This may lead to misinterpretations of generated evidence,

since, for instance, negations and subject–object relationships are ignored. Using

more linguistic information is therefore an obvious way to improve systems,

especially as tools for generating such information are in principle available in the

NLP community. However, such attempts sometimes report disappointing results.

The reasons for this finding are diverse, including parsers lacking accuracy or insufficient

adaptation of the extraction techniques to the representation of information in

the text.

Dagstuhl seminar proposal "Ontologies and Text Mining for Life Science"
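The point about ignored negations can be illustrated with a minimal sketch. It is not taken from any system discussed at the seminar: the suffix-stripping stemmer, the negation cue list, the example sentences and all function names are assumptions made purely for illustration. A stem-only co-occurrence check accepts both an assertion and its negation as evidence, while even a crude sentence-level negation test separates the two.

```python
import re

# Assumed toy cue list; real systems use richer cue lexicons and scope detection.
NEGATION_CUES = {"not", "no", "never", "without"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def crude_stem(token):
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def cooccurrence_evidence(sentence, entity, relation_stem, target):
    """Stem-only extraction: fires whenever all three stems occur together."""
    bag = {crude_stem(t) for t in tokenize(sentence)}
    return {entity, relation_stem, target} <= bag


def negation_aware_evidence(sentence, entity, relation_stem, target):
    """Same check, but rejects any sentence that contains a negation cue."""
    if not cooccurrence_evidence(sentence, entity, relation_stem, target):
        return False
    return not any(t in NEGATION_CUES for t in tokenize(sentence))


positive = "GeneA is expressed in liver"
negative = "GeneA is not expressed in liver"
query = ("genea", "express", "liver")

for sentence in (positive, negative):
    print(sentence,
          cooccurrence_evidence(sentence, *query),
          negation_aware_evidence(sentence, *query))
```

The sentence-level cue test is itself a simplification: it ignores the scope of the negation and says nothing about which entity is the subject and which the object, which is where parser output would be needed.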

The second day of the seminar gave room to presentations on technical

advances in Text Mining systems and applications. Named Entity Recognition, a core

topic of Text Mining for years now, was the focus of talks by Ted

Briscoe (ComputerLab, Cambridge, U.K.), Peter Murray-Rust (University of Cambridge,

U.K.) and Martin Hofmann-Apitius (Fraunhofer SCAI, Bonn, Germany).

Ted Briscoe reported promising results on improving the accuracy of recognizing

names of fly genes in text, a notoriously difficult task. The other two speakers presented

the latest results from applying Text Mining to chemical entities, which, in particular,

include the analysis of images in text to recover chemical structures. Advances in

systems for relationship extraction were presented by Goran Nenadic (University of

Manchester, U.K.) and Jung-Jae Kim (EBI, Hinxton, Cambridge, U.K.). A system covering

a particularly important area, the resolution of anaphora in text, was shown by Su

Jiang (Infocomm, Singapore). Notably, this system is also available as a web service, so

that it can be included in Text Mining pipelines distributed world-wide.
